This document details bin visualization in anvi'o.

Anvi'o is run in a dedicated conda environment.

conda activate anvio-7.1

Get bin info

Get the bin info into a format that anvi'o can use. This means concatenating the bin files for each binning method into a single two-column, tab-delimited file that lists which contig/read goes in which bin.

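library(stringr)  # provides str_replace()

# list.dirs() is recursive by default and returns the parent directory first,
# so elements 2:7 are taken to be the per-method bin directories.
path <- list.dirs("../data/Bins")

for (i in 2:7){
  
  DF <- NULL
  
  pathname <- path[i]
  filelist <- list.files(pathname)
  
  for (filename in filelist){
    # Each bin file lists one contig/read name per line
    df <- read.csv(file.path(pathname, filename), header = FALSE)
    colnames(df) <- "read"
    # Bin name = file name with the first "." replaced by "_"
    df$bin <- str_replace(filename, "[.]", "_")
    # Spell out the leading sample count in the bin names
    if (basename(pathname) == "24_sample_bam_bins"){
      df$bin <- str_replace(df$bin, "24", "twentyfour")
    }
    if (basename(pathname) == "47_sample_bam_bins"){
      df$bin <- str_replace(df$bin, "47", "fortyseven")
    }
    DF <- rbind(DF, df)
  }
  
  # One headerless, tab-delimited file per binning method
  write.table(DF, paste0("../output/all_bins/", basename(pathname), ".tsv"), row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t")
  
}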

Examine tsv files.

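# The file was written without a header row, so read it with header = FALSE;
# otherwise the first record is consumed as column names.
tsv_output <- read.csv("../output/all_bins/assembly_bins.tsv", sep = "\t", header = FALSE, col.names = c("read", "bin"))
kable(head(tsv_output, 6))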
MG1058_s821.ctg000852l assembly_bin_1
MG1058_s1105.ctg001148l assembly_bin_1
MG1058_s1585.ctg001645l assembly_bin_1
MG1058_s1820.ctg001893l assembly_bin_1
MG1058_s645.ctg000674l assembly_bin_10
MG1058_s914.ctg000951l assembly_bin_10
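
Before importing, it can help to sanity-check each collection file. The sketch below is one way to do that, assuming the files are headerless, tab-delimited, and have exactly two columns as written by the R loop above. `check_collection` is a hypothetical helper name, not an anvi'o program.

```shell
# check_collection FILE: hypothetical helper, not part of anvi'o.
# Reports lines that do not have exactly two tab-separated fields and
# items that are assigned to more than one bin. Returns non-zero if any
# problem is found.
check_collection() {
  awk -F '\t' '
    NF != 2 {
      printf "line %d: expected 2 fields, found %d\n", NR, NF
      bad = 1
      next
    }
    ($1 in seen) && seen[$1] != $2 {
      printf "line %d: %s is in both %s and %s\n", NR, $1, seen[$1], $2
      bad = 1
    }
    { seen[$1] = $2 }
    END { exit bad + 0 }
  ' "$1"
}
```

For example, `check_collection ../output/all_bins/assembly_bins.tsv` prints nothing and returns 0 when the file looks consistent.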

Import bins into anvio

Import each set of bins as a collection into the anvi'o profile database that was already created.

# Example for one collection; change the input TSV and the -C name for each binning method
anvi-import-collection "./github/jordan-marinimicrobia/output/all_bins/short_reads_bam_bins.tsv" \
                       -p "./Downloads/plus_PROFILE.db" \
                       -c "./Library/CloudStorage/GoogleDrive-jwinter2@uw.edu/Shared drives/Rocap Lab/Project_ODZ_Marinimicrobia_Jordan/Anvio/assembly_plus/1058_P1_2018_585_0.2um_assembly_plus.db" \
                       --contigs-mode \
                       -C shortreads

Run interactive browser

anvi-interactive -p "./Downloads/plus_PROFILE.db" \
                 -c "./Library/CloudStorage/GoogleDrive-jwinter2@uw.edu/Shared drives/Rocap Lab/Project_ODZ_Marinimicrobia_Jordan/Anvio/assembly_plus/1058_P1_2018_585_0.2um_assembly_plus.db"

Example of what the interactive browser looks like with bins.

Anvio interactive browser

Refine bins

Inspect “contaminated” bins to see how and why they are contaminated. Remember that in the imported collections, “.” in bin names was replaced with “_”, and the numbers 24 and 47 were spelled out as “twentyfour” and “fortyseven”.

anvi-refine -p "./Downloads/assembly_PROFILE.db" \
            -c "./Library/CloudStorage/GoogleDrive-jwinter2@uw.edu/Shared drives/Rocap Lab/Project_ODZ_Marinimicrobia_Jordan/Anvio/assembly_only/1058_P1_2018_585_0.2um_assembly.db" \
            -C shortreads \
            -b short_reads_bam_bin_163

Example of a contaminated bin.

Anvio interactive display of a contaminated bin

Get summary statistics

# Use a name that does not shadow the base summary() function
bin_summary <- read.table("../output/anvio_outputs/assembly_plus_summary.txt", sep = "\t", header = TRUE)
summary(bin_summary)
##      bins            total_length       num_contigs           N50         
##  Length:274         Min.   :  202581   Min.   :   1.00   Min.   :  10203  
##  Class :character   1st Qu.:  310692   1st Qu.:   9.00   1st Qu.:  12386  
##  Mode  :character   Median :  488296   Median :  23.00   Median :  14620  
##                     Mean   :  879731   Mean   :  52.00   Mean   :  74294  
##                     3rd Qu.:  896170   3rd Qu.:  50.75   3rd Qu.:  50200  
##                     Max.   :20079188   Max.   :1582.00   Max.   :3034959  
##    GC_content    percent_completion percent_redundancy   t_domain        
##  Min.   :26.47   Min.   :  0.00     Min.   :   0.00    Length:274        
##  1st Qu.:38.99   1st Qu.:  0.00     1st Qu.:   0.00    Class :character  
##  Median :45.84   Median :  0.00     Median :   0.00    Mode  :character  
##  Mean   :47.44   Mean   : 13.16     Mean   :  16.19                      
##  3rd Qu.:57.02   3rd Qu.: 23.59     3rd Qu.:   0.00                      
##  Max.   :69.50   Max.   :100.00     Max.   :2053.52                      
##    t_phylum           t_class            t_order            t_family        
##  Length:274         Length:274         Length:274         Length:274        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    t_genus           t_species        
##  Length:274         Length:274        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

Creating pangenome

Creating anvio dbs for my bins

anvi-gen-contigs-database -f sulf_genomes/assembly_plus_bin_4.fa -o sulf_genomes/dbs/sulfbin4.db

anvi-run-hmms -c sulf_genomes/dbs/sulfbin4.db
anvi-run-scg-taxonomy -c sulf_genomes/dbs/sulfbin4.db
anvi-scan-trnas -c sulf_genomes/dbs/sulfbin4.db
anvi-run-ncbi-cogs -c sulf_genomes/dbs/sulfbin4.db
anvi-run-kegg-kofams -c sulf_genomes/dbs/sulfbin4.db
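
The commands above handle a single bin. A minimal sketch of repeating them for every FASTA file in a directory follows. It assumes the reference data for the taxonomy, COG, and KOfam steps is already set up. `run` and `annotate_bins` are hypothetical helper names, and each database is named after its FASTA file, not renamed by hand as with `sulfbin4.db`.

```shell
# run CMD...: hypothetical helper. With DRY_RUN=1 it prints the command
# instead of executing it, which is useful for checking paths first.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# annotate_bins FASTA_DIR DB_DIR: hypothetical wrapper around the anvi'o
# commands shown above. Builds and annotates one contigs database per
# .fa file and stops at the first failing step.
annotate_bins() {
  for fa in "$1"/*.fa; do
    [ -e "$fa" ] || continue               # glob matched nothing
    db="$2/$(basename "$fa" .fa).db"
    run anvi-gen-contigs-database -f "$fa" -o "$db" &&
      run anvi-run-hmms -c "$db" &&
      run anvi-run-scg-taxonomy -c "$db" &&
      run anvi-scan-trnas -c "$db" &&
      run anvi-run-ncbi-cogs -c "$db" &&
      run anvi-run-kegg-kofams -c "$db" || return 1
  done
}
```

Running `DRY_RUN=1 annotate_bins sulf_genomes sulf_genomes/dbs` prints the commands without executing them. Dropping `DRY_RUN=1` runs them for real.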

anvi-gen-genomes-storage -e sulf-external-genomes.txt \
                         -o sulf-GENOMES.db
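
The external genomes file passed with `-e` is a tab-delimited table with the columns `name` and `contigs_db_path`. The sketch below shows one way to generate it from a directory of contigs databases. `make_external_genomes` is a hypothetical helper, and replacing every character other than letters, digits, and underscores is an assumption about which genome names anvi'o will accept.

```shell
# make_external_genomes DB_DIR: hypothetical helper, not part of anvi'o.
# Prints a tab-delimited table with the header "name<TAB>contigs_db_path"
# and one row per .db file found in DB_DIR.
make_external_genomes() {
  printf 'name\tcontigs_db_path\n'
  for db in "$1"/*.db; do
    [ -e "$db" ] || continue               # glob matched nothing
    name=$(basename "$db" .db)
    # Assumption: keep names to letters, digits and underscores
    name=$(printf '%s' "$name" | tr -c 'A-Za-z0-9_' '_')
    printf '%s\t%s\n' "$name" "$db"
  done
}
```

For example, `make_external_genomes sulf_genomes/dbs > sulf-external-genomes.txt` writes the table for every database in that directory.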

anvi-pan-genome -g sulf-GENOMES.db -n sulfitobacter

anvi-display-pan -g sulf-GENOMES.db -p sulfitobacter/sulfitobacter-PAN.db

Pangenome visualization.

Sulfitobacter pangenome